A Framework for Effectively Integrating Hard and Soft Syntactic Rules into Phrase Based Translation

Authors

  • Jiajun Zhang
  • Chengqing Zong
Abstract

Adding syntactic knowledge to phrase-based translation by using hard or soft syntactic rules to reorder the source language, so that it closely approximates the target-language word order, has been successful in improving translation quality. However, this approach propagates pre-reordering errors to the later translation step (decoding). In this paper, we propose a novel framework to integrate hard and soft syntactic rules into phrase-based translation more effectively. For a source sentence to be translated, hard or soft syntactic rules are first acquired from the source parse tree prior to translation; then, instead of reordering the source sentence directly, the rules are used as a strong feature integrated into our elaborately designed model to help phrase reordering in the decoding stage. Experiments on NIST Chinese-to-English translation show that our approach, whether incorporating hard or soft rules, significantly outperforms the previous methods.
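The contrast the abstract draws, between reordering the source up front and letting the rules act as a feature during decoding, can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names, the representation of a rule as a precedence pair, the exhaustive search over permutations, and the numeric scores are hypothetical and are not taken from the paper, whose actual model is not described on this page.

```python
from itertools import permutations


def rule_agreement(order, preferred_pairs):
    """Count rule-suggested precedence pairs (a before b) that `order` respects."""
    position = {phrase: i for i, phrase in enumerate(order)}
    return sum(1 for a, b in preferred_pairs if position[a] < position[b])


def score(order, base_scores, preferred_pairs, rule_weight):
    """Toy log-linear score: base model score plus a weighted rule feature.

    `base_scores` stands in for all other decoder features (hypothetical).
    """
    base = base_scores.get(tuple(order), 0.0)
    return base + rule_weight * rule_agreement(order, preferred_pairs)


def best_order(phrases, base_scores, preferred_pairs, rule_weight):
    """Pick the highest-scoring phrase order by brute force (toy search)."""
    return max(
        permutations(phrases),
        key=lambda o: score(o, base_scores, preferred_pairs, rule_weight),
    )


# Hypothetical rule from the source parse: the VP should precede the PP.
rules = [("VP", "PP")]

# Case 1: the other features are nearly indifferent, so the rule decides.
weak_base = {("PP", "VP"): -1.0, ("VP", "PP"): -1.5}
choice_weak = best_order(("PP", "VP"), weak_base, rules, rule_weight=1.0)

# Case 2: the other features strongly disagree, so the rule is overridden.
strong_base = {("PP", "VP"): -0.2, ("VP", "PP"): -2.0}
choice_strong = best_order(("PP", "VP"), strong_base, rules, rule_weight=1.0)
```

In the first case the rule tips the decision toward the suggested order; in the second the remaining evidence outweighs it. A pre-reordering pipeline would have committed to the rule's order in both cases, which is the error-propagation problem the abstract refers to.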

Similar resources

A unified approach for effectively integrating source-side syntactic reordering rules into phrase-based translation

Phrase-based translation models, with sequences of words (phrases) as translation units, achieve state-of-the-art translation performance. However, phrase reordering is a major challenge for this model. Recently, researchers have focused on utilizing syntax to improve phrase reordering. In adding syntactic knowledge into phrase reordering model, using handcrafted or probabilistic syntactic rule...

A Unified Model for Soft Linguistic Reordering Constraints in Statistical Machine Translation

This paper explores a simple and effective unified framework for incorporating soft linguistic reordering constraints into a hierarchical phrase-based translation system: 1) a syntactic reordering model that explores reorderings for context free grammar rules; and 2) a semantic reordering model that focuses on the reordering of predicate-argument structures. We develop novel features based on b...

Phrase-Boundary Translation Model Using Shallow Syntactic Labels

Phrase-boundary model for statistical machine translation labels the rules with classes of boundary words on the target side phrases of training corpus. In this paper, we extend the phrase-boundary model using shallow syntactic labels including POS tags and chunk labels. With the priority of chunk labels, the proposed model names non-terminals with shallow syntactic labels on the boundaries of ...

Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

Title of dissertation: Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models Yuval Marton, Doctor of Philosophy, 2009 Dissertation directed by: Professor Philip Resnik, Department of Linguistics and Institute for Advanced Computer Studies This dissertation focuses on effective combination of data-driven natural language processing (NLP) approaches with lingu...

Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation

This paper describes a factored approach to incorporating soft source syntactic constraints into a hierarchical phrase-based translation system. In contrast to traditional approaches that directly introduce syntactic constraints to translation rules by explicitly decorating them with syntactic annotations, which often exacerbate the data sparsity problem and cause other problems, our approach k...

Publication date: 2009